A Formalization and Proof of the 
Extended Church-Turing Thesis 



— Extended Abstract — 



nachum.dershowitz@cs.tau.ac.il



Nachum Dershowitz 

School of Computer Science 
Tel Aviv University 
Tel Aviv, Israel 



Evgenia Falkovich* 

School of Computer Science 
Tel Aviv University 
Tel Aviv, Israel 

jenny.falkovich@gmail.com



We prove the Extended Church-Turing Thesis: Every effective algorithm can be efficiently simulated 
by a Turing machine. This is accomplished by emulating an effective algorithm via an abstract state 
machine, and simulating such an abstract state machine by a random access machine, representing 
data as a minimal term graph. 

1 Introduction 

The Church-Turing Thesis asserts that all effectively computable numeric functions are recursive and, 
likewise, that they can be computed by a Turing machine, or, more precisely, can be simulated under some 
representation by a Turing machine. This claim has recently been axiomatized and proven [3, 6]. The 
"extended" thesis adds the belief that the overhead in such a simulation is polynomial. One formulation 
of this extended thesis is as follows: 

The Extended Church-Turing Thesis states . . . that time on all "reasonable" machine models 
is related by a polynomial. (Ian Parberry [9]) 

We demonstrate the validity of this thesis for all (sequential, deterministic, non-interactive) effective 
models over arbitrary constructive domains in the following manner: 

1. We adopt the axiomatic characterization of (sequential) algorithms over arbitrary domains due to 
Gurevich [8] (Section 2, Definition 1). 

2. We adopt the formalization of effective algorithms over arbitrary domains from [3] (Section 2, 
Definition 4). 

3. We adopt the definition of simulation of algorithms in different models of computation given in [2]. 

4. We consider implementations, which are algorithms operating over a specific domain (Section 2, 
Definition 2). 

5. We represent domain elements by their minimal constructor-based graph (dag) representation; 
cf. [1] (Section 5). 

6. We measure the size of input as the number of vertices in a constructor-based representation 
(Section 3, Definition 6). 

*This work was carried out in partial fulfillment of the requirements for the Ph.D. degree of the second author. 
7. We emulate effective algorithms step-by-step by abstract state machines [8] in the precise manner 
of [2] (Section 4, Theorem 8). 

8. Each basic implementation step can be simulated in a linear number of random-access machine 
(RAM) steps (Section 5, Theorem 15). 

9. Input states to the simulation can be encoded in linearly many RAM steps (Section 5, Theorem 14). 
10. As multitape Turing machines simulate RAMs in quadratic time [4], the thesis follows (Section 6). 

2 Algorithms 

First of all, an algorithm, in its classic sense, is a time-sequential state-transition system, whose transi- 
tions are partial functions on its states. This ensures that each state is self-contained and that the next 
state, if any, is determined. The necessary information in states can be captured using logical struc- 
tures, and an algorithm is expected to be independent of the choice of representation and to produce no 
unexpected elements. Furthermore, an algorithm should possess a finite description. 

Definition 1 (Algorithm [8]). A classical algorithm is a (deterministic) state-transition system, satisfying 
the following three postulates: 

I. It consists of a set¹ $S$ of states, a subset $S_0 \subseteq S$ of initial states, and a partial transition 
function $\tau : S \rightharpoonup S$ from states to states. States for which there is no transition are terminal. 

II. All states in $S$ are (first-order) structures over the same finite vocabulary $F$, and $X$ and $\tau(X)$ share 
the same domain for any $X \in S$. For convenience, we treat relations as truth-valued functions, 
refer to structures as algebras, and let $t_X$ denote the value of term $t$ as interpreted in state $X$.² The 
sets of states, initial states, and terminal states are each closed under isomorphism. Moreover, 
transitions respect isomorphisms. Specifically, if $X$ and $Y$ are isomorphic, then either both are 
terminal or else $\tau(X)$ and $\tau(Y)$ are also isomorphic via the same isomorphism. 

III. There exists a fixed finite set $T$ of critical terms over $F$ that fully determines the behavior of 
the algorithm. Viewing any state $X$ over $F$ with domain $D$ as a set of location-value pairs 
$f(a_1, \dots, a_n) \mapsto a_0$, where $f \in F$ and $a_0, a_1, \dots, a_n \in D$, this means that whenever states $X$ and 
$Y$ agree on $T$, in the sense that $t_X = t_Y$ for every critical term $t \in T$, either both are terminal states 
or else $\tau(X) \setminus X = \tau(Y) \setminus Y$. 

For detailed support for this characterization of algorithms, see [8, 6]. Clearly, we are only interested 
here in deterministic algorithms. We use the adjective "classical" to clarify that, in the current study, we 
are leaving aside new-fangled forms of algorithm, such as probabilistic, parallel or interactive algorithms. 

A classical algorithm may be thought of as a class of implementations, each computing some (partial) 
function over its state space. An implementation is determined by the choice of representation for the 
values over which the algorithm operates, which is reflected in a choice of domain. 

Definition 2 (Implementation). An implementation is an algorithm $(\tau, S, S_0)$ restricted to a specific 
domain $D$. Its states are those states $S|_D \subseteq S$ with domain $D$; its input states $S_0|_D \subseteq S_0$ are those initial states 
whose domain is $D$; its transition function $\tau|_D$ is likewise restricted. 



¹ Or class; it doesn't matter. 
² All "terms" in this paper are ground (i.e., variable-free). 
So we may view an implementation as computing a function over its domain. 

In the following, we always assume a predefined subset $I \cup \{z\}$ of the critical terms, called 
inputs and output, respectively. Input states may differ only on input values, and input values must cover 
the whole domain. Then we may speak of an algorithm $A$ with terminating run $X_0 \leadsto_A \cdots \leadsto_A X_n$ as 
computing $A(y_1, \dots, y_k) = z_{X_n}$, where $y_1, \dots, y_k$ are the values of the inputs in $X_0$. The presumption that an implementation accepts any value from its 
domain as a valid input is not a limitation, because the outcome of an implementation on undesired 
inputs is of no consequence. 

The postulates in Definition 1 limit transitions to be effective, in the sense of being programmable, 
as we just saw, but they place no constraints on the contents of initial states. In particular, initial states 
may contain infinite, uncomputable data. To preclude that, we will need an additional assumption. 

Definition 3 (Basic). We call an algebra $X$ over vocabulary $F$ and with domain $D$ basic if $F = K \uplus J$, $D$ 
is isomorphic to the Herbrand universe (free term algebra) over $K$, the constructors of $X$, and $t_X = s_X \neq 
\mathrm{UNDEF}$ for at most a finite number of terms $t$ and $s$ over $K \uplus J$, for some pervasive constant value UNDEF. 
An implementation is basic if all its initial states are basic with respect to the same constructors. 

Constructors are the usual way of thinking of the domain values of computational models. For example, 
strings over an alphabet $\{a, b, \dots\}$ are constructed from a nullary constructor $\varepsilon$ and unary constructors 
$a(\cdot)$, $b(\cdot)$, etc. The positive integers in binary notation may be constructed out of the nullary $\varepsilon$ and unary 
$0(\cdot)$ and $1(\cdot)$, with the constructed string understood as the binary number obtained by prepending the digit 1. 

Definition 4 (Effectiveness [3]). Let $X$ be an algebra over vocabulary $F$ and domain $D$. We call $X$ 
effective over $F = K \uplus C$ if $K$ constructs $D$ and each of the operations in $C$ can be computed by an effective 
implementation over $K$. In other words, $C$ is a set of effective oracles, obtained by bootstrapping from 
basic implementations. An effective implementation is a classical algorithm restricted to initial states that 
are all effective, over the same partitioned vocabulary $F = K \uplus C$. 

Clearly, the properties of being basic or effective are closed under the transitions of an algorithm (this 
follows from Postulate III). Hence, any reachable state of a basic (effective) implementation is also basic 
(effective, respectively). 

3 Complexity 

The complexity of an algorithm is classically measured as the number of single steps required by its execution, 
relative to the size of the initial data. This requires an interpretation of the notions "initial data size" and 
"single step". By a "step", we usually mean a single step of some well-defined theoretical computational 
model, like a Turing machine or RAM, implementing an algorithm over a chosen representation of the 
domain. 

An effective implementation may simulate an effective algorithm over a chosen representation of the 
domain, but it still cannot serve as a faithful measure of a single step since, unlike a basic implementation, 
its states are allowed to contain infinite, nontrivial information in the form of oracles. 

Basic implementations provide an underlying model for effective ones (and thus are a faithful mea- 
sure of a single step): 

Proposition 5. Let $P = (\tau, S, S_0)$ be an effective implementation over $K \uplus C$. Then there exists a basic 
implementation simulating $P$ over some vocabulary $K \uplus J$. 

The proof uses the notion of simulation defined in [2] and standard programming techniques of internal- 
izing operations by bootstrapping. 
For example, if an effective implementation includes decimal multiplication among its bootstrapped 
operations, then we do not want to count multiplication as a single operation (which would give a 
"pseudo-complexity" measure), but, rather, the number of basic decimal-digit operations, as would be 
counted in the basic simulation of the effective implementation. 
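To make the contrast concrete, here is a sketch assuming schoolbook long multiplication over decimal digit strings as the underlying basic procedure (an illustrative choice, not one prescribed by the paper). It returns the product together with the number of single-digit multiply-and-add steps, which is what the basic simulation would charge instead of a single step.

```python
def schoolbook_multiply(x, y):
    """Multiply decimal digit strings, counting single-digit operations."""
    a = [int(d) for d in reversed(x)]  # least significant digit first
    b = [int(d) for d in reversed(y)]
    acc = [0] * (len(a) + len(b))      # result digits, least significant first
    ops = 0
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            cur = acc[i + j] + da * db + carry
            acc[i + j] = cur % 10
            carry = cur // 10
            ops += 1                   # one digit-level multiply-and-add
        acc[i + len(b)] += carry       # this position was still 0
    digits = "".join(str(d) for d in reversed(acc)).lstrip("0") or "0"
    return digits, ops

print(schoolbook_multiply("123", "45"))  # ('5535', 6)
```

The count is the product of the two digit lengths, so the charged cost grows with the operands instead of staying at one.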

With a notion of single step in hand, we are only left to define a suitable notion of input size. Let 
$P = (\tau, S, S_0)$ be an effective implementation with constructors $K$. Recall from Definition 3 that the 
domain of each $X \in S_0$ is identified with the Herbrand universe over $K$. Thus, domain elements may 
be represented as terms over the constructors $K$. Now, we need to measure the size of input values 
$y$, represented as constructor terms. The standard way to do this would be to count the number of 
symbols $|y|$ in the constructor term for $y$. The more conservative way, which we propose to use, is to count 
the minimal number of constructors required to access it. For example, we want the size of $f(c,c)$ to 
be 2, not 3. 

Definition 6 (Size). The (compact) size of a term $t$ over vocabulary $K$ is $\|t\| := |\{s : s \text{ is a subterm of } t\}|$. 
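Definition 6 can be read as counting the vertices of the term's minimal dag. A minimal sketch, again assuming a hypothetical nested-tuple encoding (label, *children), contrasts it with the symbol count:

```python
def compact_size(t):
    """||t||: the number of distinct subterms of t (Definition 6)."""
    seen, stack = set(), [t]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(s[1:])  # push the immediate subterms
    return len(seen)

def symbol_count(t):
    """|t|: the number of symbol occurrences in t, repetitions included."""
    return 1 + sum(symbol_count(c) for c in t[1:])

c = ("c",)
t = ("f", c, c)
print(compact_size(t), symbol_count(t))  # 2 3
```

For $t_0 = c$ and $t_{k+1} = f(t_k, t_k)$, the compact size is $k+1$ while the symbol count is $2^{k+1} - 1$, so the choice of measure can matter exponentially.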

Still another issue to consider is this: a domain may be constructible by infinitely many different 
finite sets of constructors, which affects the measurement of size. We are accustomed to saying that the 
size of $n \in \mathbb{N}$ is $\lg n$, relying on the binary representation of natural numbers, even though 
the implementation itself may use tally (unary) notation or any other representation. Suppose now that 
somebody claims to have an effective implementation over $\mathbb{N}$, working under the supposition that 
the size of $n$ ought to be measured by $\log \log n$. Should this be legal? We neither accept nor reject such 
claims blindly, but require justification. 
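The gap between the two conventions is easy to see with the compact size of Definition 6. The sketch below is illustrative only: the constructor names `z`, `s`, `e`, `0`, `1` and the tuple encoding are assumptions. It measures the same number as a tally term and as a term over the binary constructors.

```python
def compact_size(t):
    """Number of distinct subterms of a term encoded as (label, *children)."""
    seen, stack = set(), [t]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(s[1:])
    return len(seen)

def tally(n):
    """s(s(...s(z)...)) with n applications of s (hypothetical names)."""
    t = ("z",)
    for _ in range(n):
        t = ("s", t)
    return t

def binary(n):
    """Term over e, 0(.), 1(.) for positive n, leading 1 implicit."""
    t = ("e",)
    for ch in bin(n)[3:]:
        t = (ch, t)
    return t

n = 200
print(compact_size(tally(n)), compact_size(binary(n)))  # 201 8
```

In general the tally term for $n$ has size $n+1$, while the binary term has size equal to the bit length of $n$, roughly $\lg n$.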

Switching representations of the domain, one actually changes the vocabulary and thus the whole 
description of the implementation. Still, we want to recognize the result as being the "same" implemen- 
tation, doing the same job, even over the different vocabularies. 

Definition 7 (Valid Size). Let $A$ be an effective implementation over domain $D$. A function $f : D \to \mathbb{N}$ 
is a valid size for elements of $D$ if there is an effective implementation $B$ over $D$ such that $A$ and $B$ are 
computationally equivalent (each simulating the other) via some bijection $\rho$, such that $f(x) = \|\rho(x)\|$ for 
all $x \in D$. 

4 Abstract State Machines 

Abstract state machines (ASMs) [8] provide a perfect language for describing algorithmic 
transition functions. They consist of generalized assignment statements $f(s_1, \dots, s_k) := u$, conditional 
tests if $C$ then $P$ or if $C$ then $P$ else $Q$, where $C$ is a Boolean combination of equations between terms, 
and parallel composition. A program as such defines a single transition; it is executed repeatedly, as a 
unit, until no assignment has its conditions enabled. If no assignments are enabled, then there is 
no next state. 
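The execution model just described can be sketched in a few lines of Python. This is a toy interpreter under simplifying assumptions: states are flat dictionaries of nullary locations, guards and right-hand sides are Python functions, and clashing updates (which would make a genuine ASM step fail) are not checked. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
# A guarded update (guard, location, right-hand side), evaluated in one state.
Rule = Tuple[Callable[[State], bool], str, Callable[[State], int]]

def step(program: List[Rule], state: State) -> Optional[State]:
    """One ASM transition: fire all enabled updates in parallel, or None."""
    updates = {}
    for guard, loc, rhs in program:
        if guard(state):
            updates[loc] = rhs(state)  # evaluated in the *old* state
    if not updates:
        return None  # no enabled assignment: terminal state
    new_state = dict(state)
    new_state.update(updates)
    return new_state

def run(program: List[Rule], state: State) -> State:
    """Repeat the transition until no assignment is enabled."""
    while (nxt := step(program, state)) is not None:
        state = nxt
    return state

# Euclid's algorithm as one parallel block:
#   if b != 0 then a := b, b := a mod b
euclid: List[Rule] = [
    (lambda s: s["b"] != 0, "a", lambda s: s["b"]),
    (lambda s: s["b"] != 0, "b", lambda s: s["a"] % s["b"]),
]
print(run(euclid, {"a": 12, "b": 18}))  # {'a': 6, 'b': 0}
```

Because both right-hand sides read the old state, the two updates take effect together; executed sequentially, the second would see the new value of `a` and always yield 0.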

A triple $(\mathcal{A}, S, S_0)$ is called an abstract state machine (ASM) if $S$ is a set of states and $S_0$ a set of initial states of an 
ASM program $\mathcal{A}$, such that $(\mathcal{A}, S, S_0)$ satisfies the conditions for being an algorithm given in Definition 
1. Every algorithm is emulated step-by-step, state-by-state, by an ASM. 

Theorem 8 ([8]). Let $(\tau, S, S_0)$ be an algorithm over vocabulary $F$. Then there exists an ASM $(\mathcal{A}, S, S_0)$ 
over the same vocabulary, such that $\tau$ is the transition function of $\mathcal{A}$, with the terms (and subterms) appearing in the ASM 
program serving as critical terms. 

Definition 9 (ESM). An effective state machine (ESM) is an effective implementation of an ASM. 
Constructors are part and parcel of the states, though they need not appear in an ESM program. 
5 Simulation 

We know from [3, Theorem 3] that for any effective model there is a string representation of its domain 
under which each effective implementation has a Turing machine that simulates it, and — by the same 
token — there are RAM simulations. Our goal is to prove that it can be done at polynomial cost. We 
describe a RAM algorithm satisfying these conditions. The result will then follow from the standard 
poly-time (cubic) connection between TMs and RAMs. First, we need to choose an appropriate RAM 
representation for our domain of terms. 

For term $t$, we denote the minimal graph representing it by $\bar{t}$ and the quantity of RAM memory 
required to store it by $|\bar{t}|$. These memory cells will each contain a small constant, indicating a vertex 
label, or a pointer, corresponding to an edge in the graph. Note that since $\bar{t}$ is minimal, it does not contain 
repeated factors. To prevent repeated factors not just in one term, but in the whole state, we merge the 
individual term graphs into one big graph and call the resulting "jungle" a tangle (see [10]). The tangle 
will maintain the constructor-term values of all the critical terms of the algorithm. Consider, for example, 
the natural way to merge terms $t = f(c,c)$ and $s = g(c,c)$, where $c$ is a constant. The resulting dag $G$ has 
three vertices, labeled $f$, $g$, $c$. Two edges point from $f$ to $c$ and the other two from $g$ to $c$. Our two terms 
may be represented as pointers to the appropriate vertex: $G(t)$ refers to the $f$ vertex and $G(s)$ to $g$, where 
we use the notation $G(t)$ to refer to the vertex in $G$ that represents the term $t$. 
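One standard way to keep a tangle free of repeated factors is hash-consing: before creating a vertex, look up its label together with its child pointers. The sketch below uses hypothetical class and method names, and a Python dictionary stands in for whatever lookup structure a RAM implementation would use; it reproduces the $f(c,c)$, $g(c,c)$ example.

```python
class Tangle:
    """A shared dag of constructor terms with no repeated factors:
    every distinct subterm is stored exactly once (hash-consing)."""

    def __init__(self):
        self.labels = []    # vertex id -> label
        self.children = []  # vertex id -> tuple of child vertex ids
        self.index = {}     # (label, children) -> vertex id

    def node(self, label, *kids):
        """Return the vertex for label(kids...), creating it only if new."""
        key = (label, kids)
        if key not in self.index:
            self.index[key] = len(self.labels)
            self.labels.append(label)
            self.children.append(kids)
        return self.index[key]

    def num_vertices(self):
        return len(self.labels)

    def num_edges(self):
        return sum(len(k) for k in self.children)

G = Tangle()
c = G.node("c")
t = G.node("f", c, c)  # G(t): the f vertex
s = G.node("g", c, c)  # G(s): the g vertex
print(G.num_vertices(), G.num_edges())  # 3 4
```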
Proposition 10. For any tangle $G$ of terms over a finite vocabulary, we have $|E(G)| = O(|V(G)|)$. 
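The reason is a simple count. Assuming, as in the $f(c,c)$ example, that a vertex labeled $f$ has one outgoing edge per argument position, and writing $a$ for the largest arity in the finite vocabulary:

```latex
|E(G)| \;=\; \sum_{v \in V(G)} \operatorname{arity}(\mathrm{label}(v)) \;\le\; a \cdot |V(G)| \;=\; O(|V(G)|)
```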

Let $(\mathcal{A}, S, S_0)$ be a basic ESM over vocabulary $F = K \uplus J$, with critical terms 
$T = \{t^1, \dots, t^m\}$, including all their subterms, ordered from small to big, and input terms $I \subseteq T$. Also, let $X_0 \leadsto X_1 \leadsto \cdots$ 
be some run of $\mathcal{A}$, for which we let $T_i$ denote the tangle of the domain values $\{t_{X_i} : t \in T\}$ of the critical 
terms in the $i$-th state $X_i$. For $\bar{t}$, a finite sequence or set of terms, we use the abbreviation $\|\bar{t}\| = \sum_{s \in \bar{t}} \|s\|$. 

One transition of an ESM involves a bounded number of comparisons of the values of critical terms. 
The cost of each is constant: 

Proposition 11. Let $T$ be a critical tangle and let $s$ and $t$ be critical terms in $T$. Then the question 
whether $t = s$ is decidable in a constant number of RAM operations of logarithmic word size. 
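Because the tangle contains no repeated factors, two critical terms have equal values exactly when they point to the same vertex, so the test reduces to one comparison of addresses. A minimal sketch, assuming the same hash-consing scheme as a stand-in for the tangle (all names hypothetical):

```python
index = {}  # (label, child vertex ids) -> vertex id, shared by the whole state

def node(label, *kids):
    """Return the unique vertex for label(kids...), creating it if needed."""
    return index.setdefault((label, kids), len(index))

def build(term):
    """Intern a term given as a nested tuple (label, *children)."""
    return node(term[0], *(build(c) for c in term[1:]))

c = ("c",)
t = build(("f", c, ("g", c)))
s = build(("f", ("c",), ("g", ("c",))))  # an equal term, built separately
u = build(("f", c, c))
print(t == s, t == u)  # True False
```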

One transition of an ESM involves a bounded number of assignments. The cost of each assignment 
is linear: 

Proposition 12. Let $t = f(\bar{s})$ be a term over vocabulary $K$. Then $\bar{t}$ can be constructed using $O(\|\bar{s}\|)$ 
RAM operations of logarithmic word size. 

Combining the previous propositions, we may conclude: 
Proposition 13. The critical tangles grow by a constant amount in each step. So $|T_i| = O(|T_0| + i)$. 
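Spelled out, if $c$ denotes the constant bound on how much one step can add to the tangle (a constant that depends only on the program), induction on $i$ gives:

```latex
|T_{i+1}| \le |T_i| + c
\quad\Longrightarrow\quad
|T_i| \le |T_0| + c\,i = O(|T_0| + i)
```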
Theorem 14 (Initial States). Given term graphs $\bar{I}$ for the inputs $I$ in an initial state $X_0$, Algorithm 1 
constructs the initial critical tangle $T_0$ in $O(\|\bar{I}\|)$ steps. 

Algorithm 1 

• for $i = 1, \dots, m$ 

    - let $t^i = f(s^1, \dots, s^k)$ 

    - if all $s^j$ are defined, then 

        * if $f \in K$, create $f(s^1, \dots, s^k)$, as described in Proposition 12 

        * if $f \notin K$, then 

            • if there are $r^1, \dots, r^k, r \in T$ such that $r^j = s^j$ for all $j$ and $r = f(r^1, \dots, r^k)$ is defined, 
              then copy the content of $r$ to $t^i$ 
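A Python sketch of Algorithm 1 under a hypothetical data layout: critical terms are given small to big as a symbol plus the positions of their immediate critical subterms, and the tangle is a hash-consing table. As an assumption made for brevity, the search for matching $r^1, \dots, r^k, r$ is replaced by a dictionary lookup over the finitely many defined non-constructor locations of the basic initial state.

```python
UNDEF = None  # stands for the pervasive constant UNDEF

def initial_tangle(critical, constructors, index, defined):
    """Sketch of Algorithm 1 (hypothetical layout).

    critical:     [(f, (j1, ..., jk)), ...], ordered small to big; each j
                  is the position of an earlier critical subterm.
    constructors: the constructor symbols K.
    index:        hash-consing table (label, child ids) -> vertex id,
                  already holding the input term graphs.
    defined:      the finitely many defined non-constructor locations,
                  (f, child vertex ids) -> vertex id.
    Returns one vertex id (or UNDEF) per critical term.
    """
    val = []
    for f, args in critical:
        kids = tuple(val[j] for j in args)
        if any(k is UNDEF for k in kids):
            val.append(UNDEF)  # some argument is undefined
        elif f in constructors:
            # create f(s^1, ..., s^k), reusing an existing vertex if present
            val.append(index.setdefault((f, kids), len(index)))
        else:
            # assumed stand-in for the search over r^1, ..., r^k, r in T
            val.append(defined.get((f, kids), UNDEF))
    return val

index = {}

def node(label, *kids):
    return index.setdefault((label, kids), len(index))

e = node("e")
x_val = node("0", node("1", e))  # hypothetical input x = 0(1(e)), i.e. 6
defined = {("x", ()): x_val}
critical = [("x", ()), ("e", ()), ("1", (0,)), ("y", ())]
print(initial_tangle(critical, {"e", "0", "1"}, index, defined))  # [2, 0, 3, None]
```

Each critical term is handled once with a constant number of lookups, mirroring the linear bound of Theorem 14 (assuming constant-time table access).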
Theorem 15 (Transitions). Algorithm 2 computes the successor tangle $T_{i+1}$ from $T_i$ in time linear in $|T_i|$. 
Algorithm 2 

• for each critical term $t \in T$, create a new pointer $t'$ to point to its new value 

• for each possible assignment in the ESM, do the following: 

    - if all guards evaluate to TRUE, then 

        - for the enabled assignment $f(s^1, \dots, s^k) := s$ do 

            * if $s$ is UNDEF, then $f'(s^1, \dots, s^k)$ is also UNDEF 

            * if some $s^j$ is UNDEF, then $f'(s^1, \dots, s^k)$ is also UNDEF 

            * otherwise, if $f \in K$, then 

                • set $f'(s^1, \dots, s^k)$ to point to the graph constructed as described in Proposition 12 

            * whereas, if $f \notin K$, then, if there are $r^1, \dots, r^k, r \in T$ such that $r^j = s^j$ for all $j$ and 
              $r = f(r^1, \dots, r^k)$ is defined, then 

                • if $f'(s^1, \dots, s^k)$ is not UNDEF, set $f'(s^1, \dots, s^k)$ to point to a copy of $r$ 
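A simplified Python sketch of one transition on a tangle, in the spirit of Algorithm 2. Assumptions (all names hypothetical): guards are equalities or disequalities between critical terms, so each is one pointer comparison; all right-hand sides read the old pointers and the new pointers are committed together; a non-constructor right-hand side is a nullary critical term whose vertex is shared. Critical terms built on top of an updated location would also need recomputing bottom-up, which this sketch omits.

```python
UNDEF = None

def transition(rules, val, index, constructors):
    """Sketch of one ESM step on a tangle (hypothetical data layout).

    rules: list of (guards, loc, (f, args)); each guard (s, t, eq) requires
           critical terms s and t to be equal (eq=True) or different.
    val:   critical-term name -> vertex id (or UNDEF) in the current tangle.
    Returns the successor valuation, or None when no rule is enabled.
    """
    new = dict(val)  # the primed pointers t', initially the old values
    fired = False
    for guards, loc, (f, args) in rules:
        # Shared dag: each guard is a single pointer comparison.
        if not all((val[s] == val[t]) == eq for s, t, eq in guards):
            continue
        fired = True
        kids = tuple(val[a] for a in args)  # read from the old state
        if any(k is UNDEF for k in kids):
            new[loc] = UNDEF
        elif f in constructors:
            # Reuse or create the vertex for f(kids), keeping the dag minimal.
            new[loc] = index.setdefault((f, kids), len(index))
        else:
            # Assumption: f names a nullary critical term; share its vertex.
            new[loc] = val[f]
    return new if fired else None

index = {}

def node(label, *kids):
    return index.setdefault((label, kids), len(index))

z = node("z")
lim = node("s", node("s", z))  # the tally numeral 2
rules = [([("n", "lim", False)], "n", ("s", ("n",)))]  # if n != lim then n := s(n)
state, steps = {"n": z, "lim": lim}, 0
while (nxt := transition(rules, state, index, {"z", "s"})) is not None:
    state, steps = nxt, steps + 1
print(state, steps)  # {'n': 2, 'lim': 2} 2
```

Since the vertex for each $s(n)$ already exists, the counter run adds no vertices, which illustrates how hash-consing keeps the growth per step bounded.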



The result we set out to achieve now follows. 

Theorem 16 (Simulating ESMs). Any effective implementation with complexity $T(n)$, with respect to a 
valid size measure, can be simulated by a RAM in order $n + nT(n) + T(n)^2$ steps, with a word size that 
grows to order $\log T(n)$. 
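For orientation, the time bound can be assembled from the earlier results, assuming the input size $n$ is $\|\bar{I}\|$ (so that $|T_0| = O(n)$ by Theorem 14), that step $i$ costs $O(|T_i|)$ by Theorem 15, and that $|T_i| = O(n + i)$ by Proposition 13:

```latex
O(n) + \sum_{i=0}^{T(n)-1} O(|T_i|)
  \;=\; O(n) + \sum_{i=0}^{T(n)-1} O(n + i)
  \;=\; O\bigl(n + n\,T(n) + T(n)^2\bigr)
```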



6 Summary 

We have shown, as has been conjectured, that every effective implementation, regardless of what data 
structures it uses, can be simulated by a Turing machine, with at most polynomial overhead in time 
complexity. Specifically, we have shown that any algorithm running on an effective sequential model 
can be simulated, independent of the problem, by a single-tape Turing machine with a quintic overhead: 
quadratic for the RAM simulation and another cubic for a TM simulation of the RAM [4]. 

To summarize the argument in a nutshell: Any effective algorithm is behaviorally identical to an 
abstract state machine operating over a domain that is isomorphic to some Herbrand universe, and whose 
term interpretation provides a valid measure of input size. That machine is also behaviorally identical to 
one whose domain consists of maximally compact dags, labeled by constructors. Each basic step of such 
a machine, counting also the individual steps of any subroutines, increases the size of a fixed number 
of such compact dags by no more than a constant number of edges. Lastly, each machine step can be 
simulated by a RAM that manipulates those dags in time that is linear in the size of the stored dags. 

It remains to be seen whether it may be possible to improve the complexity of the simulation. 